Skip to content

HIVE-27006: Fix ParallelEdgeFixer#4043

Merged
deniskuzZ merged 10 commits intoapache:masterfrom
ngsg:HIVE-27006-Fix-ParallelEdgeFixer
Nov 23, 2023
Merged

HIVE-27006: Fix ParallelEdgeFixer#4043
deniskuzZ merged 10 commits intoapache:masterfrom
ngsg:HIVE-27006-Fix-ParallelEdgeFixer

Conversation

@ngsg
Copy link
Contributor

@ngsg ngsg commented Feb 8, 2023

What changes were proposed in this pull request?

ParallelEdgeFixer refers to RowSchema when inverting columns and updates RuntimeValueInfo as well as SemiJoinBranchInfo.

Why are the changes needed?

Current ParallelEdgeFixer does not update RuntimeValueInfo while SemiJoinBranchInfo is updated.
Since TezCompiler refers to RuntimeValueInfo when adding SemiJoin edges into a Tez DAG, the inconsistency between RuntimeValueInfo and SemiJoinBranchInfo leads to the absence of SemiJoin edge in Tez runtime.

Another problem of ParallelEdgeFixer is incorrect result of colMappingInverseKeys().
In current implementation, colMappingInverseKeys() depends on Operator.getColumnExprMap(), but I found that this method sometimes returns an empty map although the Operator contains some columns. (Also the comment of this method says that it returns only key columns for RS and GBY operators.)
When this happens, ParallelEdgeFixer inserts a SEL operator without any column, and its child RS operator eventually fails due to Runtime error like below message.

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field _col0 from []
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:384)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)

Does this PR introduce any user-facing change?

No

How was this patch tested?

I tested the patch manually on cluster using the query described in JIRA and TPC-DS queries.

@ngsg ngsg changed the title Fix ParallelEdgeFixer HIVE-27006: Fix ParallelEdgeFixer Feb 8, 2023
@ngsg ngsg force-pushed the HIVE-27006-Fix-ParallelEdgeFixer branch from 5d4ff36 to 58f970d Compare February 16, 2023 10:03
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

No Coverage information No Coverage information
No Duplication information No Duplication information

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 10 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.

@github-actions github-actions bot added the stale label Jul 24, 2023
@github-actions github-actions bot closed this Aug 1, 2023
@ayushtkn ayushtkn reopened this Sep 16, 2023
set hive.exec.max.dynamic.partitions.pernode=4000;
set hive.exec.max.dynamic.partitions=10000;
set hive.exec.parallel.thread.number=32;
set hive.exec.parallel=false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need to set all these irrelevant options?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed irrelevant configurations

Map 14 <- Map 6 (BROADCAST_EDGE), Union 12 (CONTAINS)
Map 5 <- Map 6 (BROADCAST_EDGE), Union 2 (CONTAINS)
Map 6 <- Reducer 8 (BROADCAST_EDGE), Reducer 9 (BROADCAST_EDGE)
Map 6 <- Reducer 10 (BROADCAST_EDGE), Reducer 8 (BROADCAST_EDGE), Reducer 9 (BROADCAST_EDGE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems odd to me: I wonder why the sudden need of Reducer 10 for Map 6 ?

  • there are no changes in Map 6
  • Map 6 refers only to RS_47 and RS_51 ; so its odd that it has 3 Reducers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current ParallelEdgeFixer does not update RuntimeValueInformation(RVI) correctly. Because TezCompiler creates SemiJoin edges based on RVI, this issue leads to absence of some edges.

The edge between Map6 and Reducer10 is one of the disappeared edge. After SWO, Map6 has 2 incoming SemiJoin edges that come from the same reducer. So PEF inserts SEL-RS in order to prevent parallel edge, but it does not update RVI of the parent of the inserted SEL-RS. That's why previous plan does not contain an edge between Map6 and Reducer10.

I attached 3 operator graphs for the sake of your better understanding. All graphs are generated during TPCDS30TB-query2 test.
Before applying PEF:
master before dot 1

After applying current PEF:
master after dot 1

After applying modified PEF:
h27006 after dot 1

Comment on lines +1919 to +1923
boolean notTraverseable = !traverseableEdgeTypes.contains(opEdge.getEdgeType());
boolean notInvertible = (s instanceof ReduceSinkOperator) &&
!ParallelEdgeFixer.colMappingInverseKeys((ReduceSinkOperator) s).isPresent();

return notTraverseable || notInvertible;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its better to not change something which is not broken...

the previous version was eagerly avoiding to call PEF#colMIK in case the edge type was not matching - what if for some reason it starts throwing exceptions for irrelevant cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted it.

Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1, pendind tests
@ngsg, thank you for the contribution!

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 7 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

warning The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
Read more here

@deniskuzZ deniskuzZ merged commit 5861b16 into apache:master Nov 23, 2023
tarak271 pushed a commit to tarak271/hive-1 that referenced this pull request Dec 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants